Lasso-Type Recovery of Sparse Representations for High-Dimensional Data
By Nicolai Meinshausen and Bin Yu
Abstract
UC Berkeley

The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p_n is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can be asymptotically identical to the true sparsity pattern only if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2-norm sense for fixed designs, under conditions on (a) the number s_n of non-zero components of the vector β_n and (b) the minimal singular values of the design matrices that are induced by selecting small subsets of variables. Furthermore, a (nearly) optimal rate of convergence is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter.

∗ We would like to thank Noureddine El Karoui and Debashis Paul for pointing out interesting connections to Random Matrix theory. Some results of this manuscript have been presented at the Oberwolfach workshop ‘Qualitative Assumptions and Regularization for High-Dimensional Data’. Nicolai Meinshausen is supported by DFG (Deutsche Forschungsgemeinschaft) and Bin Yu is partially supported by a Guggenheim fellowship and grants NSF DMS-0605165 (06-08), NSF DMS-03036508 (03-05) and ARO W911NF05-1-0104 (05-07). Finally, we would like to thank the two referees and the AE for their helpful comments that have led to an improvement over our previous results.
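The abstract's central point, that the Lasso estimate can stay close to the truth in the ℓ2 sense even when the design contains highly correlated columns, can be illustrated with a small numerical sketch. Nothing below comes from the paper: the coordinate-descent solver, the toy design with two nearly collinear columns, and the penalty level are all illustrative assumptions.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso: minimize 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: y minus the fit of all coordinates except j.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Toy design: column 2 is almost a copy of the active column 1, so the
# design is close to violating the irrepresentable condition.
X = [[1.0, 1.1, -1.0],
     [2.0, 1.9, 1.0],
     [3.0, 3.05, -1.0],
     [4.0, 3.95, 1.0]]
beta_true = [1.0, 0.0, 0.0]
y = [row[0] for row in X]  # noiseless response generated by beta_true

beta_hat = lasso_cd(X, y, lam=0.1)
l2_err = sum((b - t) ** 2 for b, t in zip(beta_hat, beta_true)) ** 0.5
print(l2_err)  # small: the estimate is close in the l2 sense
```

Despite the near-collinearity, the ℓ2 error is tiny here; the paper's conditions on sparsity and minimal sparse singular values make this precise in general.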
Similar Resources
High-dimensional Graphs and Variable Selection with the Lasso by Nicolai Meinshausen
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensio...
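A minimal sketch of the neighborhood-selection idea described above (not the paper's implementation): each variable is Lasso-regressed on all the others, and a nonzero coefficient proposes an edge. The tiny data set, the penalty level, and the OR-combination of neighborhoods are illustrative assumptions.

```python
def soft_threshold(z, t):
    return z - t if z > t else (z + t if z < -t else 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1.
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

def neighborhood_selection(data, lam):
    # Lasso-regress each variable on all others; a nonzero coefficient on
    # variable k in the regression for variable j proposes the edge (j, k).
    # Neighborhoods are combined with an OR rule.
    n, p = len(data), len(data[0])
    edges = set()
    for j in range(p):
        y = [data[i][j] for i in range(n)]
        others = [k for k in range(p) if k != j]
        X = [[data[i][k] for k in others] for i in range(n)]
        beta = lasso_cd(X, y, lam)
        for b, k in zip(beta, others):
            if abs(b) > 1e-8:
                edges.add((min(j, k), max(j, k)))
    return edges

# Toy data: variables 0 and 1 are nearly proportional, variable 2 is
# orthogonal to both, so the only edge found should be (0, 1).
data = [[1.0, 1.1, 3.0],
        [2.0, 1.9, -3.0],
        [-1.0, -0.9, -5.0],
        [-2.0, -2.1, 1.0]]
print(neighborhood_selection(data, lam=0.5))  # {(0, 1)}
```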
Relaxed Lasso
The Lasso is an attractive regularisation method for high dimensional regression. It combines variable selection with an efficient computational procedure. However, the rate of convergence of the Lasso is slow for some sparse high dimensional data, where the number of predictor variables is growing fast with the number of observations. Moreover, many noise variables are selected if the estimato...
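The two-stage idea behind the relaxed Lasso can be sketched as follows. This is an illustrative sketch, not the paper's algorithm or notation: the solver, the toy data, and the choice phi = 0 (a plain least-squares refit on the selected set) are assumptions. Stage 1 runs the Lasso at penalty lam to select variables; stage 2 re-estimates the coefficients on the selected columns at the smaller penalty phi*lam, removing much of the shrinkage bias.

```python
def soft_threshold(z, t):
    return z - t if z > t else (z + t if z < -t else 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1.
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

def relaxed_lasso(X, y, lam, phi):
    # Stage 1: the Lasso at penalty lam picks the active set.
    beta1 = lasso_cd(X, y, lam)
    active = [j for j, b in enumerate(beta1) if abs(b) > 1e-8]
    if not active:
        return beta1
    # Stage 2: re-estimate on the selected columns at penalty phi*lam
    # (phi = 0 reduces to an ordinary least-squares refit).
    Xa = [[row[j] for j in active] for row in X]
    beta2 = lasso_cd(Xa, y, phi * lam)
    out = [0.0] * len(beta1)
    for b, j in zip(beta2, active):
        out[j] = b
    return out

# Orthogonal toy design, true model y = 2 * x0.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
print(lasso_cd(X, y, lam=1.0))                # [1.75, 0.0] -- shrunk
print(relaxed_lasso(X, y, lam=1.0, phi=0.0))  # [2.0, 0.0]  -- shrinkage removed
```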
LASSO ISOtone for High Dimensional Additive Isotonic Regression
Additive isotonic regression attempts to determine the relationship between a multi-dimensional observation variable and a response, under the constraint that the estimate is the additive sum of univariate component effects that are monotonically increasing. In this article, we present a new method for such regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear modelling t...
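The monotone building block of additive isotonic regression is a univariate isotonic fit. Below is a sketch of the classical pool-adjacent-violators algorithm (PAVA) for that fit; this is the standard isotonic-regression routine, not the LISO algorithm itself, and the example inputs are illustrative.

```python
def pava(y, w=None):
    """Pool Adjacent Violators: least-squares best monotone
    nondecreasing fit to y. Adjacent blocks that violate monotonicity
    are pooled and replaced by their weighted mean."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [mean, total_weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit

print(pava([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
print(pava([3.0, 2.0, 1.0]))       # [2.0, 2.0, 2.0]
```

In the additive setting, fits like this one are applied component-wise; LISO additionally imposes a Lasso-type penalty to keep only a sparse set of components.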
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern, powerful tools that can solve many important problems people face today. Support vector regression (SVR) is a way to build a regression model and a notable member of the machine learning family. SVR has proven to be an effective tool in real-valued function estimation. As a supervised-learning appr...
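What makes SVR effective for real-valued function estimation is its ε-insensitive loss, which ignores residuals inside a tube of width ε around the prediction. A tiny sketch (the ε value and example numbers are illustrative assumptions):

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.5):
    """SVR's epsilon-insensitive loss: residuals inside the eps-tube cost
    nothing; beyond the tube the cost grows linearly with the residual."""
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(1.0, 1.3))  # 0.0  (inside the tube, no penalty)
print(eps_insensitive_loss(1.0, 2.0))  # 0.5  (0.5 beyond the tube)
```

Only points outside the tube become support vectors, which is what keeps SVR models sparse.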
Publication date: 2007